Abstract: To improve the accuracy of image matching, a shoeprint image feature matching method based on PCA-SIFT is
proposed. First, feature detection and pre-matching of images are performed using the PCA-SIFT (principal component
analysis, scale-invariant feature transform) algorithm. The correlation coefficient is then used as the similarity measure
to filter the image interest points, which yields the candidate matching pairs. Finally, the RANSAC (random sample
consensus) algorithm is used to eliminate mismatched pairs. The simulation results demonstrate that, compared with
conventional algorithms, the proposed algorithm is more robust while maintaining good registration accuracy when
analyzing partial shoeprint images in the presence of geometric distortions such as scale and rotation.
I. Introduction

As a form of physical evidence, a shoemark can provide an important link between a criminal and the place where the
crime occurred. It has been reported that the chance of footwear impressions being present at a crime scene is equal to,
and perhaps even greater than, that of latent fingerprints [1,2,3]. Footwear impressions therefore have great potential
in assisting forensic investigations. For instance, a repeat offender who commits a series of offences in a relatively
short period of time is unlikely to discard or change his/her footwear between crime scenes [4,5].

In this paper, the issue of automatically classifying shoemarks is addressed. A critical issue that has to be overcome in
order to achieve this goal is that one may have no control over the quality of the shoemarks collected by Scene of Crime
Officers (SoCs) [6]. Partial shoeprints, resulting from external destruction or incomplete contact between the shoe sole
and the ground surface, are commonly found at crime scenes, so the performance of a retrieval system on partial prints is
of considerable interest and importance. Because such prints are incomplete impressions, many shape descriptors are
unsuitable for the application. One potential solution is to extract features at local interest points, because such
features can still express the pattern even when only a partial shoeprint is available [7, 8].

In this study, a new matching method based on the PCA-SIFT algorithm [9,10] is proposed. First, feature detection and
pre-matching of images are performed using the PCA-SIFT algorithm: the extracted interest point descriptors are matched
with a nearest-neighbor method using the Euclidean distance. Second, mismatches are removed using the RANSAC
algorithm [11,12]. This method addresses the mismatching problem in image matching.

II. The basic theory of the algorithm 
2.1 PCA-SIFT feature point extraction 

PCA-SIFT is a local descriptor used in image processing [13,14]. It is scale invariant and can detect the feature points
of an image. Its computation consists of five major stages [15].

1. Scale-space peak selection. In this stage, extreme points are detected across scales, which provides scale invariance.

The scale space of a 2D image I(x, y) is defined as follows:

$$L(x, y, \sigma) = G(x, y, \sigma) * I(x, y) \qquad (1)$$

where $*$ denotes convolution and $G(x, y, \sigma)$ is a variable-scale Gaussian function:

$$G(x, y, \sigma) = \frac{1}{2\pi\sigma^{2}} e^{-(x^{2}+y^{2})/(2\sigma^{2})} \qquad (2)$$

Here $(x, y)$ is the spatial coordinate and $\sigma$ is the scale coordinate. Each sample point is compared with all of its
adjacent points; if it is larger or smaller than all of its neighbors in both the image and scale domains, the sample
point is taken as a feature point at that scale.

2. Removal of unreliable feature points. Low-contrast candidates and points with a strong response along an edge are
discarded from the feature-point set.

3. Orientation assignment. A gradient direction histogram is built over the neighborhood of the feature point, and its
peak represents the main direction of the feature point. The gradient magnitude and direction are defined as follows:

$$M(x, y) = \sqrt{(L(x+1, y) - L(x-1, y))^{2} + (L(x, y+1) - L(x, y-1))^{2}} \qquad (3)$$

$$\theta(x, y) = \tan^{-1}\left(\frac{L(x, y+1) - L(x, y-1)}{L(x+1, y) - L(x-1, y)}\right) \qquad (4)$$

where $M(x, y)$ is the gradient magnitude at the feature point and $\theta(x, y)$ is its direction.

4. Key descriptors [13]. With Gaussian weighting, the 16x16 neighborhood of the feature point is divided into 4x4
subregions, and in each subregion an 8-direction gradient histogram is accumulated relative to the direction of the
feature point, so a 128-D description is obtained.

5. The PCA algorithm is used to reduce the 128-D descriptor to 20-D or less.
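
As an illustration of stage 3, the following minimal sketch evaluates Eqs. (3) and (4) with central differences and accumulates a magnitude-weighted direction histogram. The function names, the 36-bin histogram, and the square window are assumptions made for the example and are not taken from the paper; the Gaussian weighting of the window is omitted for brevity.

```python
import math


def gradient_magnitude_orientation(L, x, y):
    """Return (M, theta) of Eqs. (3) and (4) at an interior pixel.

    L is a smoothed image stored as a list of rows, indexed L[y][x].
    theta is returned in radians.
    """
    dx = L[y][x + 1] - L[y][x - 1]  # L(x+1, y) - L(x-1, y)
    dy = L[y + 1][x] - L[y - 1][x]  # L(x, y+1) - L(x, y-1)
    magnitude = math.hypot(dx, dy)
    # atan2 keeps the correct quadrant and avoids dividing by zero when dx == 0.
    theta = math.atan2(dy, dx)
    return magnitude, theta


def orientation_histogram(L, cx, cy, radius=4, bins=36):
    """Accumulate gradient magnitudes into direction bins around (cx, cy).

    The window must lie at least one pixel inside the image border.
    """
    hist = [0.0] * bins
    for y in range(cy - radius, cy + radius + 1):
        for x in range(cx - radius, cx + radius + 1):
            m, t = gradient_magnitude_orientation(L, x, y)
            b = int((t % (2 * math.pi)) / (2 * math.pi) * bins) % bins
            hist[b] += m
    return hist


def dominant_orientation(hist):
    """Return the centre angle (radians) of the strongest histogram bin."""
    bins = len(hist)
    peak = hist.index(max(hist))
    return (peak + 0.5) * 2 * math.pi / bins
```

The peak of the histogram plays the role of the main direction of the feature point described in stage 3.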

2.2 The generation of the projection matrix 

When the PCA method is used in image processing, the general steps are: estimate the autocorrelation matrix from the
samples; obtain the principal component directions by solving the characteristic equation; select an appropriate number
of principal components as the new features of the samples; and classify the samples after projecting them onto the
principal component directions [16].

The projection matrix must be calculated first. Because it needs to be calculated only once, it can be computed and
stored before the experiment and then loaded directly during the experiment. To generate the descriptors, a region of a
certain size around each feature point is selected, called an image patch. A series of representative images is selected
to generate the projection matrix. The computation steps, applied to the feature points detected by SIFT, are [17,18]:

1. Select a 41*41 image patch around the feature point, and rotate the image patch to the direction of the feature
point;

2. Compute the vertical and horizontal gradients of the image patch (the outermost pixels do not need to be calculated),
forming a 39*39*2=3042 dimension vector;

3. Form a K*3042 matrix A from the vectors obtained in step 2, where K is the number of feature points detected;

4. Compute B = A - meanA, where meanA is the mean matrix built from the mean value of each of the 3042 components over
the K vectors;

5. Compute the covariance matrix of B: cov B = B^T B;

6. Compute the eigenvalues and eigenvectors of the covariance matrix cov B.
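
Steps 3 to 6 can be sketched with NumPy as follows. This is a minimal illustration rather than the authors' implementation: the function name and the default of 20 components are assumptions, and keeping the eigenvectors with the largest eigenvalues is the usual PCA choice, which the steps above do not state explicitly.

```python
import numpy as np


def build_projection_matrix(A, n_components=20):
    """Return (mean_A, P) for a K x D matrix A of gradient vectors.

    mean_A has shape (D,) and P has shape (D, n_components); its columns are
    the eigenvectors of B^T B with the largest eigenvalues.
    """
    A = np.asarray(A, dtype=float)
    mean_A = A.mean(axis=0)        # mean of each of the D components
    B = A - mean_A                 # step 4: centred data
    cov_B = B.T @ B                # step 5: unnormalised covariance, D x D
    # Step 6: cov_B is symmetric, so eigh applies; it returns eigenvalues
    # in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov_B)
    order = np.argsort(eigvals)[::-1][:n_components]
    return mean_A, eigvecs[:, order]
```

For the 3042-dimension vectors of step 2, `cov_B` is a 3042 x 3042 matrix, which is why the text recommends computing the projection matrix once and storing it.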

2.3 The generation of PCA-SIFT Descriptor 

PCA-SIFT and SIFT use the same pixel position, scale, and orientation for each feature point. The difference is that,
when calculating the descriptor, PCA-SIFT uses the 41*41 pixels around the feature point to calculate the principal
components. For a concise representation, the original 2*39*39 dimension vector is reduced to a 20 dimension vector
using PCA [16].

When PCA is used to linearly project the high-dimensional data onto a low-dimensional space, the input vector is the
horizontal and vertical gradient map of the 41*41 region centered on the feature point, so the dimension of the input
vector is 2*39*39=3042. The effect of illumination can be reduced by normalizing the vector to unit length. When the
gradient vector is projected onto the principal component space, minor sources of variation are removed while the main
features of the vector are retained, so PCA can still represent the image characteristics accurately after reducing the
dimension [19]. The PCA-SIFT descriptor can be calculated once the projection matrix has been computed. In both the
query image and the reference image, the following operations are performed for each feature point detected by
SIFT [20]:

1. Select a 41*41 image patch around the feature point, and rotate the image patch to the direction of the feature point;

2. Compute the vertical and horizontal gradients of the image patch and normalize them (the outermost pixels on the top,
bottom, left, and right are not included), constructing a 39*39*2=3042 dimension vector x;

3. Calculate v = x - meanA, where meanA is the mean obtained when computing B = A - meanA for the projection matrix;

4. Multiply v by the projection matrix, which reduces the 3042 dimension vector to an n dimension vector.
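
The descriptor computation above can be sketched as follows, assuming the patch has already been cropped and rotated to the direction of the feature point. The function name, the use of central differences for the gradients, and the argument names `mean_A` and `P` (the stored mean vector and projection matrix) are assumptions made for this example.

```python
import numpy as np


def pca_sift_descriptor(patch, mean_A, P):
    """Project the gradient vector of a 41x41 patch onto the PCA basis.

    patch  : 41x41 array, already rotated to the feature point direction.
    mean_A : (3042,) mean gradient vector from the training stage.
    P      : (3042, n) projection matrix.
    """
    patch = np.asarray(patch, dtype=float)
    if patch.shape != (41, 41):
        raise ValueError("expected a 41x41 patch")
    # Gradients on the 39x39 interior; the outermost pixels are skipped.
    gx = patch[1:-1, 2:] - patch[1:-1, :-2]   # horizontal gradient
    gy = patch[2:, 1:-1] - patch[:-2, 1:-1]   # vertical gradient
    x = np.concatenate([gx.ravel(), gy.ravel()])  # 39*39*2 = 3042 values
    norm = np.linalg.norm(x)
    if norm > 0:
        x = x / norm          # unit length reduces the effect of illumination
    v = x - mean_A
    return v @ P              # n-dimension descriptor
```

Because the gradient vector is normalized to unit length, multiplying the patch intensities by a constant or adding a constant offset leaves the descriptor unchanged.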

2.4 Key points matching 

After the SIFT algorithm has extracted the feature points and the PCA algorithm has reduced the dimension of their
descriptors, the Euclidean distance is used as the similarity measure. A match is accepted only when the distance to the
closest match is less than the distance ratio times the distance to the second-closest match. The distance ratio controls
the number of matching pairs; by adjusting this parameter, the most appropriate number of matching pairs can be obtained.
The Euclidean distance is the distance between two points in real space, defined as follows:

$$\mathrm{dist}(X, Y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^{2}} \qquad (5)$$
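
The nearest-neighbor matching with the distance ratio can be sketched as below. The function names and the default ratio of 0.8 are assumptions chosen for illustration; the paper treats the ratio as a tunable parameter and does not state a value.

```python
import math


def euclidean(x, y):
    """Euclidean distance of Eq. (5) between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def ratio_test_matches(query, reference, ratio=0.8):
    """Return (query_index, reference_index, distance) for accepted matches.

    A match is accepted only if the closest reference descriptor is closer
    than `ratio` times the second-closest one.
    """
    matches = []
    if len(reference) < 2:
        return matches  # the test needs a second-closest candidate
    for qi, q in enumerate(query):
        dists = sorted((euclidean(q, r), ri) for ri, r in enumerate(reference))
        (d1, best), (d2, _) = dists[0], dists[1]
        if d1 < ratio * d2:
            matches.append((qi, best, d1))
    return matches
```

Lowering `ratio` keeps fewer but more distinctive pairs, while raising it keeps more pairs at the cost of more ambiguous ones.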

Because the calculation is based on the absolute difference in each dimension, the Euclidean metric requires all
dimensions to be on the same scale. This is an obvious shortcoming: dimensions with different scales are treated as
equal, which can cause large deviations.

In this paper, the descriptors are instead compared using the correlation coefficient, which effectively avoids the
scale problem. The correlation coefficient measures the strength of the linear relationship between two variables, here
the components of two descriptor vectors, and is an effective similarity measure. It is defined as follows [21]:

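$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^{2} \sum_{i=1}^{n} (y_i - \bar{y})^{2}}} \qquad (6)$$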
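
A minimal sketch of Eq. (6) and of filtering candidate pairs with it is given below. The function names and the threshold of 0.9 are assumptions for the example; the paper does not state the threshold it uses.

```python
import math


def correlation_coefficient(x, y):
    """Correlation coefficient r of Eq. (6) for two equal-length vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    den = math.sqrt(
        sum((a - mean_x) ** 2 for a in x) * sum((b - mean_y) ** 2 for b in y)
    )
    # A constant vector has zero variance; treat it as uncorrelated.
    return num / den if den > 0 else 0.0


def filter_by_correlation(pairs, threshold=0.9):
    """Keep the (x, y) descriptor pairs whose correlation reaches threshold."""
    return [(x, y) for x, y in pairs if correlation_coefficient(x, y) >= threshold]
```

Unlike the Euclidean distance, r is unchanged when either vector is shifted by a constant or multiplied by a positive constant, which is the property that avoids the scale problem.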

2.5 False matching pair elimination 

RANSAC is the abbreviation of Random Sample Consensus. It estimates the parameters of a mathematical model from a set
of sample data containing abnormal data (outliers), so that the valid data samples can be identified. It was first put
forward by Fischler and Bolles in 1981 [11]. In this paper, the RANSAC algorithm is used to remove mismatches caused by
noise and to obtain the correct matching point pairs.
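
The idea can be sketched as follows for matched point pairs. The paper does not state which geometric model is fitted, so the 2D affine transform used here is an assumption, as are the function names, the iteration count, and the 3-pixel tolerance.

```python
import numpy as np


def fit_affine(src, dst):
    """Least-squares 2D affine transform M (3x2) with [x, y, 1] @ M = [x', y']."""
    X = np.hstack([src, np.ones((len(src), 1))])
    M, *_ = np.linalg.lstsq(X, dst, rcond=None)
    return M


def ransac_affine(src, dst, n_iters=500, tol=3.0, seed=0):
    """Return (M, inlier_mask) for matched points src[i] -> dst[i].

    Each iteration fits a model to 3 random pairs and counts the pairs whose
    residual is below tol; the largest consensus set is refitted at the end.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    if len(src) < 3:
        raise ValueError("an affine model needs at least 3 point pairs")
    rng = np.random.default_rng(seed)
    X = np.hstack([src, np.ones((len(src), 1))])
    best = np.zeros(len(src), dtype=bool)
    for _ in range(n_iters):
        sample = rng.choice(len(src), size=3, replace=False)
        M = fit_affine(src[sample], dst[sample])
        residuals = np.linalg.norm(X @ M - dst, axis=1)
        inliers = residuals < tol
        if inliers.sum() > best.sum():
            best = inliers
    # Refit on the consensus set; pairs outside it are treated as mismatches.
    M = fit_affine(src[best], dst[best]) if best.sum() >= 3 else None
    return M, best
```

The pairs flagged `False` in the returned mask are the ones discarded as false matches.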

III. Experimental results and analysis

In this study, experiments were conducted to evaluate the performance of retrieving partial shoeprints: the toe, the
heel, the left half of the outsole, and the right half of the outsole. The evaluation measure used in this study is the
Correct Matching Rate (CMR).

To simulate the partial prints collected by Scene of Crime Officers, random quarter prints were selected to build the
four query databases above, as illustrated in Fig. 1. A shoeprint is divided into its toe and heel parts, which are then
divided into left and right parts. Each of the four test databases was built separately. A full-size shoeprint is then
used to generate four partial shoeprints, each containing c. 50% of a full shoeprint.
Fig. 1. Test database

The shoeprint image matching method is composed of three parts: feature point extraction, feature point matching with
the correlation coefficient, and removal of false matches. Combined, these three parts ensure the correctness of image
matching.

The PCA-SIFT algorithm is used to match the four images above. The results are shown in Table 1, which gives the
matching results between the image named (a) and the images named (a1~a4) obtained by the PCA-SIFT algorithm. The
matching rate of the algorithm is high for all four parts of the shoeprint. The experiment shows that the heel prints
perform as well as the toe prints. The four tests in Table 1 all show that the proposed system can accurately match toe
prints and heel prints as well as left-half prints and right-half prints.


Table 1
The matching result of images named (a1~a4) based on PCA-SIFT

Data set    Partial shoeprint    Time (sec)    Matching reliability (%)
1           a1                   2.72          95
2           a2                   2.23          84
3           a3                   2.32          92
4           a4                   2.52          94


The simulation results in Table 1 cover four groups of images showing different parts of a shoeprint, and Table 1
compares the image matching rate across these parts. Two similarity metrics, the Euclidean distance and the correlation
coefficient, are applied to match the feature points. According to the data, the matching rate of the method in this
paper is higher than that of the traditional method.

IV. Conclusion 

This work proposed local invariant features and key point matching for the recognition and retrieval of shoeprint
images. The experimental results show that the method matches toe prints and heel prints as well as left-half prints and
right-half prints. The correlation coefficient, as a similarity measure, can effectively measure the degree of
similarity between two samples. In this paper, we combine the PCA-SIFT algorithm with the correlation coefficient to get
preliminary matching pairs between two images. To improve matching accuracy, the RANSAC algorithm is used to delete
false matching points caused by noise, as well as duplicate pairs, so that one-to-one correct matching point pairs are
obtained. The simulation results demonstrate the advantage of this method: compared with conventional algorithms, the
proposed algorithm is more robust while maintaining good registration accuracy when analyzing partial shoeprint images
from crime scenes.